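Question answering (QA) over knowledge bases (KBs) is challenging because the reasoning patterns it requires are diverse and essentially unbounded. However, we hypothesize that in a large KB, the reasoning patterns needed to answer a query type recur for various entities in their respective subgraph neighborhoods. Leveraging this structural similarity between the local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that, for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs, and (ii) a parametric component that is trained to identify the (latent) reasoning patterns in the subgraphs of the KNN queries and then apply them to the subgraph of the target query. We also propose an adaptive subgraph collection strategy that selects a compact, query-specific subgraph, allowing us to scale to the full Freebase KB containing billions of facts. We show that CBR-SUBG can answer queries that require subgraph reasoning patterns and performs competitively with the best models on several KBQA benchmarks. Our subgraph collection strategy also produces more compact subgraphs (e.g., a 55% reduction in size for WebQSP while increasing answer recall by 4.85%). Code, models, and subgraphs are available at https://github.com/rajarshd/cbr-subg.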
translated by 谷歌翻译
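Solving a complex problem from scratch is often challenging, but it becomes much easier if we have access to other similar problems together with their solutions, a paradigm known as case-based reasoning (CBR). We propose a neuro-symbolic CBR approach (CBR-KBQA) for question answering over large knowledge bases. CBR-KBQA consists of a nonparametric memory that stores cases (questions and logical forms) and a parametric model that generates a logical form for a new question by retrieving cases relevant to it. On several KBQA datasets that contain complex questions, CBR-KBQA achieves competitive performance. For example, on the ComplexWebQuestions dataset, CBR-KBQA outperforms the current state of the art by 11% in accuracy. Furthermore, we show that CBR-KBQA can use new cases without any further training: by incorporating a few human-labeled examples into the case memory, CBR-KBQA successfully generates logical forms containing unseen KB entities as well as relations.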
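The retrieval step of a case-based approach can be illustrated with a minimal sketch. This is not the CBR-KBQA implementation: the bag-of-words cosine similarity stands in for whatever retriever the authors use, and the class `CaseMemory`, its methods `add` and `retrieve`, and the toy questions and logical-form strings are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CaseMemory:
    """Hypothetical nonparametric memory of (question, logical form) cases."""

    def __init__(self):
        self.cases = []

    def add(self, question, logical_form):
        # New cases become retrievable immediately; nothing is retrained.
        self.cases.append((question, logical_form))

    def retrieve(self, question, k=2):
        """Return the k stored cases most similar to the question."""
        query = bow(question)
        ranked = sorted(
            self.cases, key=lambda c: cosine(query, bow(c[0])), reverse=True
        )
        return ranked[:k]


# Toy cases; the logical-form strings are made up for illustration.
memory = CaseMemory()
memory.add("who directed Inception", "(director Inception ?x)")
memory.add("who wrote Hamlet", "(author Hamlet ?x)")
memory.add("where was Einstein born", "(birthplace Einstein ?x)")
neighbors = memory.retrieve("who directed Titanic", k=1)
```

In this toy setup the retrieved case shares the "who directed" pattern with the new question, so a downstream generator could reuse its relation while substituting the new entity. Adding a case is a plain append, which is what makes it possible to use new cases without retraining.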
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and to ensure that solutions generalize beyond previously seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant, resulting in nuisance novelty such as never-before-seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must tolerate nuisance novelty and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for adapting it to future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol, as an algorithm for reproducing experiments with the KOWL-718 benchmark, will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, and may be extended as further updates to Kinetics are released.
Developing and least developed countries face the dire challenge of ensuring that each child in their country receives the required doses of vaccination, adequate nutrition, and proper medication. International agencies such as UNICEF, WHO, and WFP, among other organizations, strive to find innovative solutions to determine which children have received these benefits and which have not. Biometric recognition systems have been sought out to help solve this problem. To that end, this report establishes a baseline accuracy for a commercial contactless palmprint recognition system that may be deployed for recognizing children aged one to five years. On a database of contactless palmprint images of one thousand unique palms from 500 children, we establish a SOTA authentication accuracy of 90.85% @ FAR of 0.01%, a rank-1 identification accuracy of 99.0% (closed set), and FPIR=0.01 @ FNIR=0.3 for open-set identification using the PalmMobile SDK from Armatura.
Most action recognition datasets and algorithms assume a closed world, where all test samples are instances of the known classes. In open set problems, test samples may be drawn from either known or unknown classes. Existing open set action recognition methods typically extend closed set methods by adding post hoc analysis of classification scores or feature distances, and they do not capture the relations among all the video clip elements. Our approach uses the reconstruction error to determine the novelty of a video, since videos from unknown classes are harder to put back together and thus have a higher reconstruction error than videos from known classes. We refer to our solution to the open set action recognition problem as "Humpty Dumpty", due to its reconstruction abilities. Humpty Dumpty is a novel graph-based autoencoder that accounts for contextual and semantic relations among the clip pieces for improved reconstruction. A larger reconstruction error indicates an increased likelihood that the action cannot be reconstructed, i.e., that Humpty Dumpty cannot be put back together again, meaning that the action has never been seen before and is novel/unknown. Extensive experiments are performed on two publicly available action recognition datasets, HMDB-51 and UCF-101, showing state-of-the-art performance for open set action recognition.
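The decision rule described here, flagging a sample as unknown when its reconstruction error is high, can be sketched in a few lines. This sketch does not reproduce the graph-based autoencoder: a projection onto a single direction stands in for the learned reconstruction, and the function names, the toy feature vectors, and the threshold calibration rule are all assumptions made for illustration.

```python
def project(x, direction):
    """Reconstruct x by projecting it onto a unit-length direction.

    This toy projection stands in for a trained autoencoder.
    """
    scale = sum(a * b for a, b in zip(x, direction))
    return [scale * d for d in direction]


def reconstruction_error(x, direction):
    """Squared Euclidean distance between x and its reconstruction."""
    recon = project(x, direction)
    return sum((a - b) ** 2 for a, b in zip(x, recon))


def is_novel(x, direction, threshold):
    """Flag a sample as unknown when its error exceeds the threshold."""
    return reconstruction_error(x, direction) > threshold


# Hypothetical setup: features of known classes lie close to the first axis.
direction = [1.0, 0.0]
known = [[2.0, 0.1], [3.0, -0.2], [1.5, 0.05]]

# Assumed calibration rule: twice the largest error seen on known samples.
threshold = 2 * max(reconstruction_error(x, direction) for x in known)

known_flag = is_novel([2.5, 0.1], direction, threshold)  # close to the axis
novel_flag = is_novel([0.2, 3.0], direction, threshold)  # far from the axis
```

The only part that carries over to a real system is the last step: compute an error per sample and compare it with a threshold calibrated on known-class data.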
Vision language (VL) models like CLIP are robust to natural distribution shifts, in part because CLIP learns on unstructured data using a technique called caption supervision: the model interprets image-linked texts as ground-truth labels. In a carefully controlled comparison study, we show that caption-supervised CNNs trained with a standard cross-entropy loss (with image labels assigned by scanning captions for class names) can exhibit greater distributional robustness than VL models trained on the same data. To facilitate future experiments with high-accuracy caption-supervised models, we introduce CaptionNet (https://github.com/penfever/CaptionNet/), which includes a class-balanced, fully supervised dataset of over 50,000 new human-labeled, ImageNet-compliant samples with web-scraped captions. In a series of experiments on CaptionNet, we show how the choice of loss function, data filtration, and supervision strategy enables robust computer vision. We also provide the codebase necessary to reproduce our experiments at VL Hub (https://github.com/penfever/vlhub/).
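Assigning image labels by scanning captions for class names can be sketched as below. The abstract does not specify the matching rule, so the case-insensitive whole-word match, the first-match tie-break, and the function name `label_from_caption` are assumptions; the class names and captions are toy data.

```python
import re


def label_from_caption(caption, class_names):
    """Return the first class name found as a whole word in the caption.

    Returns None when no class name occurs, in which case the sample would
    carry no label under this (assumed) scheme.
    """
    text = caption.lower()
    for name in class_names:
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            return name
    return None


classes = ["golden retriever", "tabby cat", "school bus"]
captions = [
    "My golden retriever at the beach",
    "A yellow school bus parked outside",
    "Sunset over the hills",
]
labels = [label_from_caption(c, classes) for c in captions]
```

The resulting labels can then be fed to an ordinary cross-entropy classifier. Captions that mention no class name yield `None`, which is one place where a data filtration choice enters.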
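Inspired by biological neurons, activation functions play an essential part in the learning process of any artificial neural network commonly used in real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state of the art. In particular, we present the various developments in activation functions over the years, together with the advantages and the disadvantages or limitations of these functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to a taxonomy of activation functions based on characterization, a taxonomy based on applications is also presented. To this end, a systematic comparison of various fixed and adaptive activation functions is performed on classification datasets such as MNIST, CIFAR-10, and CIFAR-100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computing. For this purpose, we also discuss the various requirements for activation functions used in the physics-informed machine learning framework. Furthermore, different fixed and adaptive activation functions are compared using machine learning libraries such as TensorFlow, PyTorch, and JAX.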
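The distinction between fixed and adaptive activation functions can be made concrete with a small sketch. The abstract does not name a specific adaptive function, so `tanh(a * x)` with a trainable slope `a` is used here only as one common example; the class `AdaptiveTanh` and its methods are hypothetical names, and the single-input gradient step is a simplification of training.

```python
import math


def relu(x):
    """Fixed activation: rectified linear unit."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Fixed activation with a hand-chosen negative slope."""
    return x if x > 0 else slope * x


class AdaptiveTanh:
    """Adaptive activation tanh(a * x) whose slope a is a trainable parameter."""

    def __init__(self, a=1.0):
        self.a = a

    def __call__(self, x):
        return math.tanh(self.a * x)

    def grad_a(self, x):
        """Derivative of the output with respect to a."""
        return x * (1.0 - math.tanh(self.a * x) ** 2)

    def step(self, x, upstream_grad, lr=0.1):
        # One plain gradient-descent update of a for a single input.
        self.a -= lr * upstream_grad * self.grad_a(x)


act = AdaptiveTanh()
before = act(0.5)
# A negative upstream gradient means the loss falls as the output rises,
# so the update increases a and steepens the activation.
act.step(0.5, upstream_grad=-1.0)
after = act(0.5)
```

A fixed function such as `relu` has the same shape throughout training, whereas the adaptive one changes shape as `a` is updated alongside the network weights.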
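Deep neural networks (DNNs) have shown incredible promise in learning fixed-length representations of fingerprints. Because representation learning often focuses on capturing specific prior knowledge (e.g., minutiae), there is no universal representation that comprehensively encapsulates all the discriminative information in a fingerprint. Learning an ensemble of representations can mitigate this problem, but two critical challenges need to be addressed: (i) how to extract multiple diverse representations from the same fingerprint image, and (ii) how to make optimal use of these representations during matching. In this work, we train multiple instances of DeepPrint (a DNN-based fingerprint encoder) on different transformations of the input image to generate an ensemble of fingerprint embeddings. We also propose a feature fusion technique that distills these multiple representations into a single embedding, which faithfully captures the diversity present in the ensemble without increasing computational complexity. The proposed approach has been comprehensively evaluated on five databases containing rolled, plain, and latent fingerprints (NIST SD4, NIST SD14, NIST SD27, NIST SD302, and FVC2004 DB2A), and statistically significant improvements in accuracy have been consistently demonstrated across a range of verification settings as well as closed-set and open-set identification settings. The proposed approach serves as a wrapper capable of improving the accuracy of any DNN-based recognition system.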
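Integrated development environments (IDEs) provide tool support to automate many source code editing tasks. Traditionally, IDEs use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer's next edit, so IDEs generate many suggestions at a given location. As a result, IDEs generally do not offer suggestions proactively; instead, the developer must click on a specific icon or menu and then select from a long list of potential suggestions. Consequently, developers often miss the opportunity to use the tool support because they are not aware that it exists or forget to use it. To better understand common patterns in developer behavior and produce better edit recommendations, we can additionally use the temporal context, i.e., the edits that a developer has recently performed. To enable edit recommendations based on temporal context, we present Overwatch, a novel technique for learning edit sequence patterns from traces of developers' edits performed in an IDE. Our experiments show that Overwatch has 78% precision, and that it not only completes edits where developers missed the opportunity to use the IDE's tool support but also predicts new edits that have no tool support in the IDE.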
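Flexible endoscopes for colonoscopy present several limitations due to their inherent complexity, resulting in patient discomfort and a lack of intuitiveness for clinicians. Robotic devices combined with autonomous control represent a viable solution for reducing the workload of endoscopists and the training time while improving the overall procedure outcome. Prior work on autonomous endoscope control uses heuristic policies, which limits generalization to the unstructured and highly deformable colon environment and requires frequent human intervention. This work proposes an image-based control of the endoscope using deep reinforcement learning, called Deep Visuomotor Control (DVC), to exhibit adaptive behavior in convoluted sections of the colon tract. DVC learns a mapping between endoscopic images and the control signal of the endoscope. A first user study with 20 expert gastrointestinal endoscopists was carried out to compare their navigation performance with that of DVC policies using a realistic virtual simulator. The results indicate that DVC shows equivalent performance on several assessment parameters while being safer. Moreover, a second user study with 20 novice participants was performed to demonstrate easier human supervision compared with a state-of-the-art heuristic control policy. Seamless supervision of colonoscopy procedures would enable interventionists to focus on medical decisions rather than on the control problem of the endoscope.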